Research | Tianfang Guo

Research Interests

Hardware Prefetching and Computer Architecture

I am relatively far away (although not so much so) from something of a completed manuscript for my thesis, so here is a collection of thoughts that provide a general direction for my intents.

A way to quantify prefetching is to compare existing hardware prefetchers to some magical oracle prefetcher. You can do this by analyzing the specific prefetch streams generated by said prefetchers. When you compare these streams, you can generally separate them into four sets:

Set 1: Prefetches made by hardware prefetchers that aren't made by the oracle. These aren't necessarily incorrect, but are deemed non-optimal by the oracle.
Set 2: Prefetches made by both. These are simply optimal prefetches and aren't very interesting.
Set 3: Prefetches made by the oracle that are theoretically possible for a hardware prefetcher to make. We just haven't designed one good enough to make them yet.
Set 4: Prefetches made by the oracle that are impossible for a hardware prefetcher to make. Likely because the oracle uses future information to make these, which is impossible because Intel doesn't know how to build a time machine.

It is easy to separate Sets 1, 2, and 3+4, but it's much harder to separate Set 3 from Set 4, which is the set we really care about, since Set 3 contains hard-to-make prefetches that can guide future prefetcher design.

The first-order concern is to actually create motivation to do this research. That is, is Set 3+4 actually large enough to justify putting effort into discovering its contents? There are multiple ways to do this, none of which seem (at least to me) particularly more correct than the others. The main point here is to quantify the difference between oracle streams and other prefetcher streams.

One way to do this is via compression, as a proxy for finding the Kolmogorov complexity of the streams. If you can compress a file well, it means there is less entropy in the file. I should add that the point of needing to calculate the entropy/k-complexity is that if a stream has a high amount of entropy, then it is naturally unpredictable. If the oracle stream had high entropy, it would mean the oracle is making seemingly random or unpredictable decisions on what to prefetch using its prescient information. However, according to my data so far, this is not the case—the oracle stream generally has one of the lower entropies across many benchmarks. This means the oracle is actually making relatively more patterned decisions than existing prefetchers. I am currently at this stage, which is to say mainly data collection and analysis.

Next comes the much harder part of actually separating Set 3 from Set 4. Again, there are multiple ways to do this:

Train an ML model: The rationale being that if the model can generate the prefetch, then it probably belongs in Set 3.
Create a compile-time dependency flow graph: You can somehow see if you ever get enough information in the form of instructions soon enough to make an accurate prefetch.
Vector projection (my personal idea): Project each stream as vectors onto some N-dimensional space, and then simply do vector arithmetic on those vectors. We can get the partial vectors that represent the difference between them, and that lets us separate the sets too.

Ultimately, I suspect what's going to happen is some combination of multiple of these methods (conservatively) acting as filters that generates some subset of Set 3.